
    Learning to Extract Motion from Videos in Convolutional Neural Networks

    This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures that use motion without resorting to an external algorithm, e.g. for recognition in videos. We derive our network architecture from signal processing principles to provide desired invariances to image contrast, phase and texture. We constrain weights within the network to enforce strict rotation invariance and substantially reduce the number of parameters to learn. We demonstrate end-to-end training on only 8 sequences of the Middlebury dataset, orders of magnitude less than competing CNN-based motion estimation methods, and obtain comparable performance to classical methods on the Middlebury benchmark. Importantly, our method outputs a distributed representation of motion that allows representing multiple, transparent motions and dynamic textures. Our contributions on network design and rotation invariance offer insights that are not specific to motion estimation.
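
    To make the idea of a distributed motion representation concrete, the sketch below builds a small CNN that maps a pair of frames to a per-pixel distribution over a discretised set of candidate flow vectors. This is an illustrative sketch only, not the authors' architecture: the layer sizes, filter counts and number of flow bins are invented for the example, and the paper's rotation-invariant weight constraints are not reproduced.

```python
# Minimal sketch: a CNN producing a distributed motion representation
# (a per-pixel distribution over a discretised set of flow vectors).
# All architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn

class DistributedFlowNet(nn.Module):
    def __init__(self, num_flow_bins=25):
        super().__init__()
        # Two consecutive grayscale frames stacked along the channel axis (2 x H x W).
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # One output channel per candidate flow vector; the softmax over channels
        # yields a distribution, which can encode multiple/transparent motions.
        self.flow_logits = nn.Conv2d(64, num_flow_bins, kernel_size=1)

    def forward(self, frame_pair):
        h = self.features(frame_pair)
        return torch.softmax(self.flow_logits(h), dim=1)

# Usage on a random frame pair (batch of 1, 2 frames, 64x64 pixels).
net = DistributedFlowNet()
p_flow = net(torch.randn(1, 2, 64, 64))
print(p_flow.shape)  # torch.Size([1, 25, 64, 64]); sums to 1 over dim 1
```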

    Accuracy of Anthropometric Measurements by a Video-based 3D Modelling Technique

    The use of anthropometric measurements to understand an individual's body shape and size is an increasingly common approach in health assessment, product design, and biomechanical analysis. Non-contact, three-dimensional (3D) scanning, which can obtain individual human models, has been widely used as a tool for automatic anthropometric measurement. Recently, Alldieck et al. (2018) developed a video-based 3D modelling technique enabling the generation of individualised human models for virtual reality purposes. As the technique is based on standard video images, hardware requirements are minimal, increasing the flexibility of its applications. The aim of this study was to develop an automated method for acquiring anthropometric measurements from models generated using a video-based 3D modelling technique and to determine the accuracy of the developed method. Each participant's anthropometry was measured manually by accredited operators to provide reference values. Sequential images of each participant were captured and used as input data to generate personal 3D models with the video-based 3D modelling technique. Bespoke scripts were developed to obtain the corresponding anthropometric data from the generated 3D models. A comparison of the manual measurements with those extracted by the developed method showed it to be a potential alternative to anthropometry based on existing commercial solutions. However, further development, aimed at improving modelling accuracy and processing speed, is still warranted.
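
    As an illustration of the kind of bespoke measurement script mentioned above, the sketch below estimates a single girth (e.g. a waist circumference) from a 3D body mesh by slicing its vertices near a chosen height and measuring the perimeter of the slice's convex hull. The slice height, tolerance and convex-hull approximation are assumptions made for this example; they are not the study's actual measurement protocol.

```python
# Minimal sketch: estimate a girth from mesh vertices by slicing at a height
# and measuring the convex-hull perimeter of the slice. Illustrative only.
import numpy as np
from scipy.spatial import ConvexHull

def girth_at_height(vertices, height, tol=0.01):
    """vertices: (N, 3) array in metres, y-up; returns girth in metres."""
    band = vertices[np.abs(vertices[:, 1] - height) < tol]
    xz = band[:, [0, 2]]                      # project the slice onto the x-z plane
    hull = ConvexHull(xz)
    ring = xz[hull.vertices]                  # hull vertices in counter-clockwise order
    return np.sum(np.linalg.norm(np.roll(ring, -1, axis=0) - ring, axis=1))

# Toy example: a cylinder of radius 0.15 m should give a girth of about 0.94 m.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
cylinder = np.stack([0.15 * np.cos(theta),
                     np.full_like(theta, 1.0),
                     0.15 * np.sin(theta)], axis=1)
print(girth_at_height(cylinder, height=1.0))  # approx. 2 * pi * 0.15 = 0.942
```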

    Zero-Shot Task Transfer

    In this work, we present a novel meta-learning algorithm, TTNet, that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). In order to adapt to novel zero-shot tasks, our meta-learner learns from the model parameters of known tasks (with ground truth) and the correlation of known tasks to zero-shot tasks. This intuition finds its foothold in cognitive science, where a subject (a human baby) can adapt to a novel concept (depth understanding) by correlating it with old concepts (hand movement or self-motion) without receiving explicit supervision. We evaluated our model on the Taskonomy dataset, with four tasks held out as zero-shot: surface normal, room layout, depth and camera pose estimation. These tasks were chosen based on the complexity of data acquisition and of the learning process using a deep network. Our proposed methodology outperforms state-of-the-art models (which use ground truth) on each of our zero-shot tasks, showing promise for zero-shot task transfer. We also conducted extensive experiments to study the various choices in our methodology and showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the first such effort on zero-shot learning in the task space.
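
    The sketch below illustrates, in heavily simplified form, the core idea of regressing parameters for a zero-shot task from the parameters of known-task models weighted by task correlations. The softmax weighting and the plain averaging of flattened parameter vectors are assumptions for illustration only; they are not the TTNet meta-learner described in the paper.

```python
# Minimal sketch: regress zero-shot task parameters as a correlation-weighted
# combination of known-task parameters. Illustrative assumptions throughout.
import numpy as np

def regress_zero_shot_params(known_params, correlations):
    """known_params: (K, P) flattened parameter vectors of K known-task models.
    correlations: (K,) similarity of each known task to the zero-shot task."""
    w = np.exp(correlations) / np.sum(np.exp(correlations))   # softmax weights
    return w @ known_params                                    # (P,) regressed parameters

# Toy example: three known tasks with 10-dimensional parameter vectors.
known = np.random.randn(3, 10)
corr = np.array([0.9, 0.2, 0.4])    # assumed known-task / zero-shot correlations
theta_zero_shot = regress_zero_shot_params(known, corr)
print(theta_zero_shot.shape)         # (10,)
```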

    Dynamic texture recognition using time-causal spatio-temporal scale-space filters

    This work presents an evaluation of time-causal scale-space filters as primitives for video analysis. For this purpose, we present a new family of video descriptors based on regional statistics of spatio-temporal scale-space filter responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain. We evaluate one member of this family, constituting a joint binary histogram, on two widely used dynamic texture databases. The experimental evaluation shows competitive performance compared to previous methods for dynamic texture recognition, especially on the more complex DynTex database. These results support the descriptive power of time-causal spatio-temporal scale-space filters as primitives for video analysis.
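
    The sketch below illustrates a joint binary histogram over spatio-temporal filter responses, in the spirit of the descriptor family described above. The filter bank (first-order Gaussian derivatives along t, y and x) and the sign-based binarisation are assumptions for the example; the paper's time-causal scale-space filters are not reproduced here.

```python
# Minimal sketch: a joint binary histogram descriptor over spatio-temporal
# derivative responses of a video volume. Illustrative assumptions throughout.
import numpy as np
from scipy.ndimage import gaussian_filter

def joint_binary_histogram(video, sigma=1.5):
    """video: (T, H, W) grayscale array; returns a normalised 8-bin histogram."""
    # Three first-order Gaussian derivative responses: temporal, vertical, horizontal.
    orders = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    responses = [gaussian_filter(video.astype(float), sigma, order=o) for o in orders]
    # Binarise each response by its sign and pack the three bits into a code 0..7.
    bits = [(r > 0).astype(int) for r in responses]
    codes = bits[0] * 4 + bits[1] * 2 + bits[2]
    hist = np.bincount(codes.ravel(), minlength=8).astype(float)
    return hist / hist.sum()

# Toy example on a random 16-frame, 32x32 clip.
print(joint_binary_histogram(np.random.rand(16, 32, 32)))
```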